4 research outputs found

    A novel framework for high-quality voice source analysis and synthesis

    The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are among the most challenging areas of speech processing, because humans produce a wide range of voice source realizations and because the estimates commonly contain artifacts caused by non-linear, time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Because it interprets voice source dynamics in an overly simplistic way, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor does it carry sufficient spectral richness to support truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model, we have demonstrated that the proposed method preserves more speaker-dependent information from the voice source estimates and yields more natural-sounding speech synthesis. More generally, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method achieves the desired perceptual effects, and the modified speech remains as natural-sounding and intelligible as natural speech.
In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals that suppresses aspiration noise while retaining both the slow and the rapid variations in the voice source estimate.
    EThOS - Electronic Theses Online Service (GB)
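
The abstract does not specify the thesis's thresholding strategy, so the following is only a generic illustration of wavelet thresholding, not the author's method. It is a minimal sketch that assumes a Haar wavelet, soft thresholding, and the universal threshold with a median-based noise estimate; the function names, the test signal, and the noise level are all hypothetical.

```python
import math
import random

def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold`."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def denoise(x, levels=3):
    """Soft-threshold the detail coefficients of a multi-level Haar transform."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)
    # Noise standard deviation estimated from the finest-scale details
    # (median absolute value / 0.6745), then the universal threshold.
    finest = sorted(abs(v) for v in details[0])
    mid = len(finest) // 2
    median = finest[mid] if len(finest) % 2 else 0.5 * (finest[mid - 1] + finest[mid])
    threshold = (median / 0.6745) * math.sqrt(2.0 * math.log(len(x)))
    details = [[soft_threshold(v, threshold) for v in d] for d in details]
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Hypothetical test signal: one slow sine cycle plus white Gaussian noise.
random.seed(0)
clean = [math.sin(2.0 * math.pi * n / 512) for n in range(512)]
noisy = [c + random.gauss(0.0, 0.2) for c in clean]
denoised = denoise(noisy)
print(mse(noisy, clean), mse(denoised, clean))
```

A single global threshold like this one tends to smooth away sharp events, which is exactly the trade-off between slow and rapid variations that the abstract says the thesis's strategy is designed to handle.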

    Application of Artificial Neural Network for Image Noise Level Estimation in the SVD domain

    Blind estimation of the additive white Gaussian noise level is an important and challenging area of digital image processing, with numerous applications including image denoising and image segmentation. In this paper, a novel block-based noise level estimation algorithm is proposed. The algorithm relies on an artificial neural network (ANN) to perform a complex image patch analysis in the singular value decomposition (SVD) domain and to evaluate noise level estimates. The algorithm is able to adjust the effective singular value tail length with respect to the observed noise levels. A comparative analysis shows that the proposed ANN-based algorithm outperforms the alternative single-stage block-based noise level estimation algorithm in the SVD domain in terms of mean square error (MSE) and average error for all considered choices of block size. The most significant improvements in MSE are obtained at low noise levels. For some test images, such as “Car” and “Girlface”, at σ = 1, these improvements can be as high as 99% and 98.5%, respectively. In addition, the proposed algorithm eliminates error-prone manual parameter fine-tuning and automates the entire noise level estimation process.
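
The abstract gives no implementation details, so the following is only a rough illustration of why the tail of the singular value spectrum carries noise information. It is a minimal sketch that replaces the paper's neural network with a simple linear calibration, and it assumes NumPy, a fixed tail length, and a synthetic rank-2 block; `tail_mean`, `calibrate_slope`, and `estimate_sigma` are hypothetical names, not the paper's API.

```python
import numpy as np

def tail_mean(block, tail_len):
    """Mean of the `tail_len` smallest singular values of a 2-D block."""
    s = np.linalg.svd(block, compute_uv=False)  # sorted in descending order
    return float(s[-tail_len:].mean())

def calibrate_slope(size, tail_len, rng, trials=20):
    """Average tail mean produced by unit-variance white Gaussian noise alone."""
    return float(np.mean([
        tail_mean(rng.standard_normal((size, size)), tail_len)
        for _ in range(trials)
    ]))

def estimate_sigma(block, tail_len, slope):
    """Assume the tail mean grows linearly with the noise standard deviation."""
    return tail_mean(block, tail_len) / slope

rng = np.random.default_rng(0)
size, tail_len = 64, 48                        # assumed block size and tail length
ramp = np.linspace(0.0, 255.0, size)
clean = 0.5 * (ramp[:, None] + ramp[None, :])  # synthetic rank-2 "image" block
slope = calibrate_slope(size, tail_len, rng)

for true_sigma in (1.0, 5.0, 10.0):
    noisy = clean + true_sigma * rng.standard_normal((size, size))
    print(true_sigma, round(estimate_sigma(noisy, tail_len, slope), 2))
```

Natural image blocks are not low rank, so image content leaks into the tail and a fixed tail length becomes unreliable, especially at low noise levels. That limitation is what an estimator with an adaptive tail length, such as the one described above, is meant to address.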